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I. INTRODUCTION 



The existence of massive black holes (MBH) with masses ranging from 10 5 M Q to 10 9 M Q in majority of galactic 
centers is confirmed by several observations (see [l[ and reference therein) with the MBH in our Galaxy, SgrA* @ 
being the best example. Mergers of galaxies are common events in the Universe, it is believed that each galaxy has 
had at least one merger event during its life. During these mergers, the MBH of each galaxy is driven to the center 
of the remnant full of stars and gas via dynamical friction The pairs of MBH separated by about kiloparsec 
are observed in some active galactic nuclei such as NGC6240 H, Arp299 @, ESO509-IG066 Mrk463 0] and 
J100043. 15+020637. 2 (8|. The interaction with thegas disc can bring the binary on a tighter orbit down to a few 
parsecs in a reasonable amount of time (few Myr) [9[ . There are few candidates of MBH binaries on the sub-parsec 
scale: the quasars OJ287 [3 (~ 0.05 pc) and SDSSJ092712. 65+294344.0 [n|,[l3(~ 0.1-0.3 pc). To overcome the 
last parsec separating the MBHs and bring them to the efficient gravitational wave (GW) driven inspiral several 
scenario have been proposed. Here are few possibilities: rotation of the merging galaxies and triaxial potential fl3j . 



processes involving gas 14|, resonant relaxation [15|, massive perturber [16j , young compact stars cluster 17J, effect 



from IMBH [18|, etc. When the separation is less than 10 pc, the binary evolution is efficiently driven by the 
gravitational radiation and can reach the coalescence in less than 10 9 years. 

The GWs emitted by the binary at the end of the inspiral phase followed by the merger and the ringdown will be 
detected by the future space born mission LISA with a very high signal-to-noise ratio (SNR). The MBH binaries are 
the strongest source for LISA and several such events per year are expected [l^, [2(j ■ It is believed that almost all the 
MBHs are spinning. However the predictions for the magnitudes and directions of the spins of MBHs in the binary 
systems differ largely depending on the models, the environment of the binary and the physical processes involved 
(coherent accretion, alignment of spins with the gas disc (2l| - [23| , sequence of randomly oriented accretion events [2~i| , 
etc) . Several techniques to measure the spins using electromagnetic radiation [25|, [26| have been proposed to measure 
the spins of MBH binaries, but uncertainties are still very large. The gravitational wave observations of MBH binaries 
with LISA should enable us to measure masses and spins of MBHs in the binary with unprecedented accuracy [27| . 
The knowledge of spins could give us a lot of information about the kick velocity of remnant MBH, the engines of 
active galactic nuclei, the mechanisms involved in galactic centers, etc. Finally, the spin measurements combined 
with a precise estimation of masses and sky position made with LISA will increase our understanding of the origin of 
MBHs, the galactic evolution, the galactic center, cosmology, etc. 

Several algorithms for detecting non-spinning MBH binaries in simulated LISA data have already been demonstrated 
[28l - [3"ll | . In this paper we consider inspiralling spinning MBH binaries and we present a particular adaptation of the 
Genetic Algorithm (GA) to search for GW signals from those systems. Genetic algorithms belong to the family of 
optimization methods, i.e. they look for extrcma. The first application of GA in LISA data analysis was proposed in 
[32j for Galactic binaries. In this method the waveform template is associated with an organism, and parameters play 
the role of the set of genes defining this organism. The logarithm of likelihood obtained with a given template defines 
the quality of the organism. A set (colony) of organisms is then evolved through breeding, mutation and custom 
designed accelerators with the aim of finding the genotype with the highest quality. This corresponds to the standard 
Darwin's principle: weak perishes, strong survives, or, translated into the conventional data analysis language: by 
evolving a set of templates, we are searching for the parameter set that maximizes the likelihood. 

We have applied the GA to the analysis of the third round of mock LISA data challenge. The mock data set 3.2 
consisted of the Gaussian instrumental noise, Galactic background and between four to six signals from the inspiralling 
spinning MBH binaries in a quasi-circular orbit HU. We have found several almost equal in value maxima of the 
likelihood which are widely separated in the parameter space. We search for each such strong maximum, which we 
call mode, and then explore it by a designated set of organisms. We refer to this extension of the standard GA as 
a multimodal GA. The mutlimodal GA applied to the blind search has shown an excellent performance: we have 
detected all present signals with a very accurate estimation of the parameters (were possible). 

The structure of this paper is as follows. In the next Section |TT] we describe a model of the signal and the search 
template we utilized. Then we discuss the detection statistics we associate with the quality of the organism and their 
efficient maximization over a subset of parameters. In Section UHl wc introduce a Genetic Algorithm and its particular 
implementation in GW data analysis. Then, in the Section IIVI we introduce accelerators for the rapid evolution - 
mechanisms which allow efficiently explore the parameter space. Besides few standard accelerators often used in GA 
we have introduced few new ones, specific to the MBH binary search problem. We introduce multimodal GA (MGA) 
in Section [V] and describe how to identify the modes and to follow their evolution. Our complete algorithm as part 
of the search pipeline is presented in Section PVTl which describes the search pipeline. The results are presented and 
discussed in the Section IVIII and finally we give a short summary in the concluding Section I Villi 
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II. FORMULATION OF THE PROBLEM 

The mock LISA data challenges are organized in order to encourage the development of efficient algorithms for 
gravitational wave data analysis and to evaluate their performance. The third round of MLDC consisted of five 
challenges but in this work wc focus our attention here on the data set 3.2 which contained GW signals from 4-6 
binaries of spinning MBHs (exact number was not revealed to the participants), on top of the confusion Galactic 
binaries background and the instrumental noise. These data was an improvement upon the MLDC challenges 1.2 and 
2.2 by adding spins to MBHs. The spin-spin and spin-orbital coupling causes the orbital and spins precession which 
results in the modulation of the amplitude and phase of the GW signal. The prior range on the parameters and the 
detailed set up of the challenge can be found in (33|. 



A. Model of the template 



The signal used in MLDC is modeled as the amplitude-restricted waveform (i.e. only dominant harmonic at the 
leading order is used) with the phase taken up to the second Post-Newtonian (PN) order with the leading order 
contributions from the spin-orbital and spin-spin coupling. The binary evolution is described as a quasi-circular 
adiabatic inspiral. 

The waveform is described by fifteen parameters which are: the two masses m\ and m 2 , the time at coalescence t c , 
the sky location of the source in ecliptic coordinates, latitude /J and longitude A, the dimensionless spin parameters, 
Xi and X2, the initial direction of the spins, polar angles 6s 1 and 6s 2 and azimuthal angles <ps 1 and 0s 2 , the initial 
direction of the orbital angular momentum, polar angle 9^ and azimuthal angle </>£, the phase at coalescence <fr c , and 
the luminosity distance D^. 

We denote the unit vector in the direction of the orbital angular momentum as L and two spins are Si = ximfSi, 
S 2 = X2fn 2 S 2 , where Si ,2 are unit vectors and < X1.2 < 1. The precession equations are given in 34 1 
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The modulation of the waveform due to the presence of spins is taken at the leading order. 
The orbital angular frequency with spin effect up to 2 PN order is given by 
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In our following analysis, we use 77 and the chirp mass M c = fe^rp as independent parameters instead of m\ 

and m,2- 

The intrinsic phase is 
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The phase is defined with respect to the ascending node, however the spin-orbital coupling causes precession of the 
orbit, therefore we need to introduce precessional correction to the phase, It depends on the unit vector n 

pointing from the solar system barycenter to the source: 

$(*) =*orb(t) +<**(*), (9) 

• . . „ • (L ■ h)(L x n) ■ L . . 

$(t)=«> + 8* = u+ K , } \ f ' , (10) 

1 — (L ■ n) z 
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/* \l-(i-n) 

The gravitational wave polarization components in the source frame are given by 

h+ = /i +0 cos2$ = -^^(1 + cos 2 t)(Afw) 2/3 cos2$ 

h x = /i x0 sin2$ = ^^cost(Afw) 2/3 sin2$, (12) 

where cos 1 = L ■ h. 

The strain that the GW exerts on the LISA detector is the following linear combination of h + and h x 

h(t) = F+(f3 7 \)h s + (t) + F x ((3, X)h s x (t), (13) 

where F + and F x are "detector beam-pattern" coefficients. The polarization components in the radiation frame, h+ 
and h x , are expressed as 

h+ = —h + cos 2iJj — h x sin 2tp, 

h x = h + sin 2ip - h x cos 2ip, (14) 
where the polarization angle ^ is defined by 

sin/3cos(A — 6 A sin6>L — cos6*l cos/3 . , 

tanV= — — ; 7t • (15) 

sint/i sm(A — 4>l) 

Note, due to the precession of the orbital plane, this polarization angle varies during the evolution of the binary. 

The data sets in MLDC are the TDI (time delay interferometry) variables. These are the combinations of the time 
delayed measurements which drastically reduce the laser frequency noise (35l . l36j . In our search, we adopted the two 
orthogonal (i.e. with uncorrelated noise) TDI channels, A and E ; in the phase domain (i.e. strain). In our template, 
we consider a long wavelength approximation to these signals (37l.[38j. This approximation (Luj -C 1, where L is LISA' 
armlength and u is an instantaneous frequency of GW) works pretty well below approximately 5 mHz. Assuming 
rigid LISA with equal arms, the waveform sampled at discrete times takes the following form [33, HH 

hi{t k ) ~ 2L sinA</) 2L (^.) x 

{-h+o(t k ) [cos(2ip(t k ))F +I (t) - sm(2^(t k ))F XI (t)] sin0'(t fc ) 

+h x0 {t k )[sm(2^(t k ))F +I (t) + cos(2iP(t k ))F XI {t)}cos(b'(t k )} , (16) 
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where I = {A, E}, A(j>2 L (t) = {<t>(t k ) - 4>{t k - 2L))/2, <j>'{t) = (<p{t k ) + 4>{t k - 2L))/2 with <j){t) being the phase of 
GW and t k = t + n.ro is the time in LISA frame with Yq the vector from the Sun to LISA bary center. The antenna 
pattern functions F+i and F x j corresponding to the TDI channel, have the following expressions 

F+(6 d , \ d ; t,n) = ^ [6 sin(20 d ) (3 sin ($ T (t) + A d + Q) - sin (3*r(t) - X d + fi)) 
-18V3sin 2 6 d sin (2$ T (t) + fi) - \/3 (l + cos 2 6> d ) x 

(sin (4$ T (<) - 2A rf + fi) + 9 sin (2A rf + Q))] , (17) 
F x {8 d ,\ d ;t,n) = [V3cos6» (i (cos(4$T(t)-2A d + 0)- 

9cos(2A d + ft)) +6sin6» d (cos(3$ T (i) - A d + f2)+ 

3cos($ T (i) + A d + ft))] (18) 

with 9 d — (5 + 7r/2, Ad = A + 7t, = lnt/Year and f2 = 0, — 7r/2 for channels A and E respectively. Note that 

this is the long wavelength approximation to the signal injected in the simulate data, we found it to be a reasonably 
accurate representation until the last 1 — 1/2 cycles before the merger. The end of the signal is discussed in more 
detail later. 

B. quality estimation 



The signal from one detector is 



s i (t) = h i (t,6) + n i (t), (19) 



where hi(t,9) is a signal described by a set of parameters 9 and rij(i) is the stationary Gaussian noise characterized 
by the power spectral density (PSD) S n (f), 



Sn(f)S(f-f) = 2h(f)h(f). 

Here the over-bar means the average over ensemble of the noise realizations and the tilde denotes the Fourier transform 
defined as 

/>oo 

n(f) = dt n(t)e 2wif . (20) 



We use the maximum likelihood method (ML) |39h41| for estimating the parameters of the waveform 9. Assuming 
that the noise n(t) is a normal process with zero mean, then the likelihood (probability) of the presence of signal h(9) 
in the detector output is given by 

P (s\9) = Ge-^WIs-MW^ (21) 
where (h \ s) is the inner product of the signal given by 

(h\») = 2j o df ^ . (22) 

The noise is the sum of instrumental noise S!^ st '(f) and the GW confusion noise from Galactic binaries S„ Bln ' (/). 
In strain data (i.e. phase measurements) , the instrumental noise for TDI variables A and E is described by the following 
PSD 

Sir-(f) = 16sin 2 (0 L ) (S° PN (/) + (cos(20 L ))Sr-(/)) 

-4sin(20 L ) sin(^) (4S° PN (/) + S^(f)) , (23) 

where the acceleration noise is 

Sn°if) = 5 - 75 x 10- 53 (/~ 4 + l(T 8 /- 6 ) Hz' 1 and the optical path noise and the shot 
noise are S° PN (/) = 3.675 x 1CT 42 Hz' 1 . 

The Galactic GW confusion noise is a combination of the unresolved si gnal s from ~ 30 millions of white dwarf 
binaries. This noise is modeled by the following function, in units of Hz -1 , |42l. l43l| 

10 -44. 62/ -2.3 1Q -4 < f < 10-3 
5in . m _ , 10-50.02/-4.4 10-3 < f < lQ-2.7 

KJ)-\ 10 -62.8y-8.8 lO- 2 ' 7 < / < 10~ 2 - 4 [ ' 

10" 89 - 68 /" 20 lO" 2 ' 4 < / < io~ 2 
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1. Maximization of the likelihood 

The goal of the method presented in this paper is to find the maximum of the likelihood in the 15-dimensional 
parameter space, and, thus, obtain the maximum likelihood estimation of the parameters of the signal. The value of 
the likelihood tells us also about the statistical significance of the detected event. In case of LISA data, the signals 
usually have high signal-to-noise ratio (SNR), so the probability of the false detection is rather low. However, the 
data is signal dominated and several GW signals of one type (say, Galactic binaries) could conspire and produce 
significa ntly high SNR at the output of the matched filtering during the search for another type of signal (say, SMBH 
binary) Q. 

It is possible to maximize the likelihood analytically over two parameters and we will call the resulting function 
Maximized Likelihood (or quality) . The procedure of maximization is similar to the one used to produce the ^-statistic 
[29L l45j|. Due to amplitude modulation we can only maximize over two parameters: the luminosity distance Dl and 
the phase at coalescence $ c . 

We will be working with the logarithm of the likelihood (getting rid of the normalization factors which does not 
depend on parameters by adopting the relative likelihood): 

1 z / 

here the sum is over the independent detectors (TDI streams, / = {A, E}). The GW template (fT6|) can be express as 

hi = ^2 a k hki (26) 

k 

and the extrema of In C over ak are found by solving the coupled set of equations 

<91n£ 



da k 

and it turns out that 



= (27) 



= X k (M kl )- 1 withX k = J2(s\hki), M k i = Y,( h ki\hu). (28) 



The log likelihood maximized over ak is called the ^-statistic in the case where a k are four functions of (constant) 
amplitude, polarization angle, inclination and initial phase: 

T~ ^XkiMki^Xt. (29) 

In the case of spinning MBH binary, the analytic maximization is possible only over the luminosity distance Dl and 
the phase at coalescence <fi c . Consequently, the dimension of the search parameter space is reduced to 13. Following 
(f2"6f the template (jTSJ) can be expressed as 

hi{t) = <zi hu(t) + a 2 h 2 i(t) (30) 

with 

ai = cos{2<t> c )/D L { hu = hj{D L = lGpc; 4> c = 0) 



a 2 = sin(20 c )/Di ' \ h 2 i = hi(D L = lGpc; <f> c = tt/4) (31) 
Using the above expressions and the orthogonality h 2 i — i hu we obtain the maximized log likelihood. 

T „ (Ej < >) 2 + (£/ < fjj^J >) 2 f32) 

(E/ < h u \h u >) 2 
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2. Maximization over the time of coalescence 



In order to efficiently find the time of coalescence, we use correlation in place of the inner products. Given a 
template h which is constructed with the initial value (usually taken at the edge of the prior) t c0 and using the 
inverse Fourier transform, we find the value of r max which maximizes (|32[) or which is equivalent to maximizing 

c ( r) = 2 r df mm^nm e -^. (33) 

Jo Sn(f) 

Note that the amplitude of the signal depends on the choice of t c via annual modulation caused by LISA's orbital 
motion, therefore the new value t c ,i = t c .o + r max is not necessarily the final answer. The time of coalescence which 
maximizes the quality Q32p for given other parameters should correspond to maximum of (|33[) at zero (or almost zero) 
lag. Using the new value of t c we repeat the maximization, and we stop iterations when the difference \t ci — t ci _i\ 
is sufficiently small. Usually few iterations are sufficient to find t c which maximizes the quality. 



3. The waveform termination 



The signal from MBH binaries is band limited, the lower frequency limit is defined approximately by twice the 
orbital phase at t = 0. 

The upper frequency is introduced somewhat arbitrarily. To terminate both the signal and the template smoothly 
an exponential taper is applied. The taper affects the data when two black holes are separated by a distance R = 7M 
and kills the signal completely around R = 6M (which is the last stable orbit for the test mass in Schwarzschild 
space-time). Therefore, in computing the overlaps, we use the maximum frequency in the integration corresponding 
to the orbital separation 6M: 

1 77 3/5 

• /max nM(R/M) 3 / 2 it{R/Mf/ 2 M c ' y ' 

The exponential taper causes problems for the long-wavelength approximation, and our template deviates from 
the signal during the last cycle. Unfortunately these small deviations fall in the most sensitive part of the LISA 
band and are further enhanced by high SNR. This causes a significant problem: the bias caused by this deviation is 
unacceptably large because there is a large region of the parameter space that produces templates which fit the end 
part of the signal perfectly (using incorrect parameters) but fail to reproduce the low frequency part of the signal. 

In order to solve this problem we terminate the template waveform few cycles earlier by fixing cutoff frequency which 
corresponds to the orbital separation R > 7M . Our approximation becomes better as we go to lower frequencies, 
however we start losing power of the signal (SNR) which is highly undesirable. We automatically readjust the frequency 
cut-off if the SNR drops below a certain threshold. 

We want to emphasize a very important feature which accompany the earlier termination of the waveform. The 
map of the quality changes: in the Figure [T] we show the map of the quality in the "chirp mass" - "eta" plane keeping 
other parameters fixed to their true values. On the left panel we show IFf u u (we use no frequency cut off other that 
introduced by the taper), and, on the right panel, we plot T cu t with template cut at f max — f cu t = 0.26 mHz. One 
can see multiple maxima in both plots, but(!) the position of the secondary maxima are different whereas the location 
of the true (global) maximum (indicated by an arrow) is the same. It can also be seen that the size of the secondary 
maxima on the right panel is smaller. We will use these features later in our search. 



4- A-statistic 



Chopping the template at lower frequency solves the problems mentioned above but is not completely satisfactory. 
We lose some SNR and consequently some accuracy in the parameter estimation, we also lose information stored at 
the end of the signal which is especially important to recover spin-related parameters. In order to reduce the impact 
of the coalescence part, without killing it completely. For that, we introduce a new function, called A-statistic which 
is simply a geometrical mean of the Maximized Likelihood of the cut waveform and the Maximized Likelihood of the 
full waveform: 

A = V-^cut x J" full . (35) 

A-statistic is not log likelihood anymore, but one of its advantages is that it keeps the information from the full 
waveform including the coalescence but at the same time it enhances the information coming from the low-frequency 



8 




FIG. 1. Distribution over M c and rj, of the Maximized Likelihood (quality) computed with the full waveform on left panel and 
with the waveform cut at / ram = 0.26 on the right panel. This example corresponds to a signal with the following parameters: 
P = -0.38896 rad, A = 3.28992 rad, t c = 19706568.3273 sec, M c = 1589213.34 M , t] = 0.23647, 8 L = 2.78243 rad, 
4>l = 1.53286 rad, X i = 0.24115, X 2 = 0.16145, S i = 1.20839 rad, <j> S i = 5.61808 rad, 6 S 2 = 0.39487 rad, (j> S 2 = 5.82937 rad, 
Dl = 6856164697.8 parsec, <f> c = 4.96746 rad . The arrow points to the true parameters. 



part. A-statistic also reduces the number of local maxima as can be seen in the Figure [51 In this example we have 
reduced the size and number of maxima from five to three. 




FIG. 2. Distribution of ^4-statistic over M c and r\. This example corresponds to the same signal as in Figure [T] The arrow 
points to the true location of parameters of the signal. 



III. GENETIC ALGORITHM 



A. The basic principle 



In order to find all the parameters of the signal, we need an effective algorithm to search over the 13 dimensional 
parameter space. Building the grid in the multi-dimensional parameter space is a highly non-trivial problem. The use 
of the stochastic/random bank [4^il - l49l | is a feasible method for the template placement, however a full grid scan over 
the whole parameter space would be prohibitively computationally expensive. Alternative would be to use variations 
of the Markov chain Monte-Carlo [2|| or nested sampling [3(| methods. Here we have chosen to use genetic algorithm 
(GA) (adjusted to our needs) to search for the global maximum of the likelihood in multi-dimensional parameter 
space. 

The GA is derived from the computer simulations of the biological system, which were originally introduced by 
Professor Holland and his students in Michigan University. It is a method for the global search (optimization method) 
based on the natural selection principle - the basis for the evolution theory established by C. Darwin. In the nature, 
organisms adapt themselves to their environment: the smartest/strongest/healthiest organisms are more likely to 
survive and participate in the breeding to produce the offsprings. These two processes, selection and breeding, are 
used in genetic algorithms to produce a new generation of organisms. Since the best organisms arc more likely to 
participate in breeding, the new generation should be better than the previous one (at least no worse). So this 
procedure induces the evolution of the organism, just like in the nature, the good qualities of the parents can be 
transferred to their offsprings. In the biological world, besides these two basic operations, among every generation, 
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there are always few individuals which have better characteristics to adopt to the environment, produced as a result 
of a positive mutation. By introducing the new genotype into the population, mutation can potentially improve the 
forthcoming generations and consequently accelerate the evolution towards the global maximum. 

Some measure of "goodness" needs to be associated with each organism. In the case of gravitational wave search, 
it is natural to associate the logarithm of the likelihood (or any other equivalent detection statistic e.g. Maximized 
Likelihood or A-statistic discussed in the previous section) with the "goodness" which needs to be "improved" through 
the evolution of the organisms. We will call the chosen measure of "goodnccs" , the quality of an organism Q. 

Following is a brief description of how a typical GA works. We start with a randomly chosen group of organisms 
(templates), we evaluate the quality of each organism (log likelihood). We select set of pairs (parents) based on 
their quality, the organisms with better quality (templates with higher likelihood) are chosen more often than weak 
organisms. We combine genotype of two parents to produce a child (we combine parameters of two chosen templates 
to produce a new one) . Number of produced children is equal to the number of parents (we keep number of evolving 
organisms (generation) fixed). Next we allow with a certain probability a random mutation in the children's genes 
(with some probability we randomly change the parameters of the new templates, exploring a larger area of the 
parameter space). The parents are discarded and the resulting children form a new generation. We repeat the 
procedure until we reach steady state (maximum in the quality). In this simple example we keep only one generation 
active (one group of templates). 

A list of (biological) GA terms with the equivalent terms in GW data analysis is given in the Table |U 



Genetic algorithm 



GW search 



organism 
gene (of an organism) 
allele (of a gene) 
quality Q 
colony of organisms 
n-th generation 
(selection + breeding) + mutation 



template 
parameter (of a template) 
bits (of the value of the parameter) 
Maximized Likelihood or ^4-statistic 
evolving group of templates 
the state of colony at n-th step of evolution 
way of exploring the parameter space 



TABLE I. Relation between GA and GW notions. 



In the following subsections we give a detailed description of each element of the basic GA and then we introduce 
the specific modifications to speed up the search. 



B. Code of the gene 



As we have discussed above, every organism is associated with a template and the parameters of the template play 
the role of genes. So each organism is described by 15 genes, two of them are chosen optimally (maximization of 
the log likelihood, see Sec. Ill B 1[) and the gene corresponding to the time of coalescence is efficiently found using 
correlation (see Sec. Ill B 21) . We imitate the DNA structure by describing the gene (parameter value) by a set of 
alleles. In our implementation we adopt a binary representation of the gene (parameter) which means that each allele 
(bit) has two possible values: or 1. In practice we first fix the precision of each parameter (by fixing the number 
of significant digits in the decimal format) and then we translate it to standard binary and/or in Gray form. In our 
method we use both representations, the reason will be explained later when we discuss quantization issue. 

Let us show how this is done in practice. Consider a parameter 9 k with the uniform prior range [xk, m in, Zfc.max]- 
First we convert a value x k of 9 k into an integer c k = (x k - x fejmm )/Aa; fc where Ax k = (x ktmax - x k , min )/2 Nk is the 
resolution of 9 k and N k is the number of bits. Then, we convert c k into the set of bits b k [i] using the coding rule 
of the chosen representation. As we see, the resolution for each parameter depends on the number of bits N k used 
for describing it and is the same for both representations. The importance of the bit is determined by its position. 
A change of a bit in a higher position (significant bit) corresponds to a big change in the parameter value. In our 
convention, the first bit, b k [0], is the lowest significant bit and the last bit, b k [N — 1], is the highest significant bit. 

There is a close relationship between two gene representations. We can transform the binary representation to Gray 
representation by the following procedure: given a string of binary code with N bits {-B[0], B[l], • • • , B[N — 1]}, we 
set B[N] = then the Gray code with the same N bits is 

G[i}= B[i + l]AB\i], (36) 

where the operator "A" corresponds to the XOR operator in programming languages. Other way round, by setting 
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again B[N] = 0, we can get the binary representation with N bits from the Gray representation as 

B[i] = B[i + 1] A G[i]. (37) 

C. Selection 

The selection process chooses the parents for breeding. The probability of selecting an organism is defined by its 
quality. Organisms with higher quality have a better chance of being chosen to participate in the breeding. First 
the quality Q%, i.e. the Maximized Likelihood or A-statistic, for all organisms is computed (index i refers to the i-th 
organism). Then each organism is assigned the probability of being chosen for breeding as pi = Qif%2j Qj- The 
selection is made by the roulette selection method: we choose a random number uniformly from [0,1]; if it is bigger 
than pi and smaller than Pi+i, then the i th organism is selected. This selection ensures that the "good" organisms are 
chosen more often than the "bad" ones and guarantees that the genotype responsible for a high quality propagates 
in generations approaching the optimal value. 

In our approach we do not take into account the geographical proximity between parents (in other words possible 
correlation between templates in the same generation). By forbidding the breeding between the correlated parents, 
it might be more efficient to explore a larger region of parameter space, but the overall resolution of the method will 
be reduced. We therefore do the selection based only on the quality. 



D. Breeding 



After selecting the parents, we need to produce the new generation, this can be achieved through "breeding". 
Breeding is the rule according to which a child is produced from the selected parents. The genes of the child are 
constructed by mixing the corresponding genes of each parent. We take one part from the first parent and the other 
part from the second one. Depending on which parts are chosen, there are several types of breeding. We usually 
use three different types: cross-over one random point, cross-over two fixed points, and random. For the cross-over 
one point, we choose one bit (denoted by i) randomly as the cross-over point and the child's genes are created by 
combining the first i bits of the genes first parent with the last N — i bits of the genes of the second parent (see the left 
panel of the Figure [3]). For the cross-over two fixed points, the genes of the child are built from three equal parental 
parts (see the middle panel of the Figure [3]). In the random breeding, each child's bit is chosen randomly from the 
corresponding bits of the parents (see the right panel of the Figure [3]) . 
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FIG. 3. Examples of used breeding: cross-over one random point on the left, cross-over two fixed points in the middle and 
random on the right panels. 



E. Mutation 

The first generation is chosen randomly by drawing parameters uniformly within the priors specified in [33|. The 
chosen selection implies that the quality of our organisms is likely to be increased with each generation. But, if we 
use only these two processes, the range of resulting genes is quite restricted: it totally depends on the initial random 
state and is just a combination of the parts from the first generation. The combination of genes and therefore the 
exploration of the parameter space is very limited and completely dependent on the initial choice. This undesirable 
feature can be cured by introducing mutation. 

Mutation in GA works in a way similar to how it operates in the nature. Mutation is a random change of few alleles 
in a gene of an organism; in our algorithm it corresponds to changing few bits in a representation of a parameter value 
of a template. The probability of mutation is called the probability mutation rate (PMR). We mutate each gene of 
each child independently and there are several types of mutation. First we need to decide whether we mutate a gene 
or not, and, if yes, we need to decide on the mutation rule (how we do it). The first possibility is that we always 
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mutate the gene and mutation is applied to each bit of gene independently. Each bit is flipped with probability PMR. 
The second possibility is to mutate a gene with probability PMR. In this case we have used two different rules to 
mutate the gene: (i) we flip N randomly chosen bits (ii) we flip N adjacent bits. Different types of mutations together 
with the value of PMR define the exploration area of the parameter space. An example is shown in Figure^! in which 
we start with PMR = 0.5 at the beginning of the search (left panel, one can see that the templates are scattered all 
over the space) and then slowly reduce it to PMR = 0.01 (the right panel). The true solution is located in the center 
of the blue circle. We will come back to PMR again in the Section HV A 31 





FIG. 4. Example of distribution of 100 organisms in two dimensions space which is the sky position (/3,A). The left panel shows 
the case of large PMR value (0.5) which corresponds to the beginning of the search. The right-panel shows the case of small 
PMR value (0.01) which corresponds to the end of the search. The best organism as well as the true solution is at the center 
of the blue circle. 



F. Tuning the algorithm using code, breeding and mutation 



In order to get comprehensive understanding of all kinds of representation, breeding and mutation to tunc our 
search, we did the systematic experiments to test one degree of freedom at the time. We fixed GA configuration and 
allow only one parameter to vary and analyzed the results of the search. We have tested PMR, number of organisms 
in the generation, gene representation (binary, Gray, alternating) type of breeding and mutation. We would need a 
separate paper to summarize this study, it is not our intend here, we will just give a small example below. 

We found that alternating the binary and Gray representation is more effective than using only one of them. We 
characterized different types of breeding and mutation according to the resulting exploration area of the parameter 
space. The result of three such combinations is summarized in the Table HU Based on this result we decided to 
start the evolution with exploration of the large part of the parameter space (BC01R-MNR8), then continued with 
BCOIR-MA, and finally, as our algorithm converged to the solution we explored small area around the best point 
intensively (rcfinment with BR-MNA8). 



Name 


Combina 

Breeding 


tion 

Mutation 


Width of ] 
large area 


Exploration 
local area 


BC01R-MNR8 
BCOIR-MA 
BR-MNA8 


Cross-over one random point 
Cross-over one random point 
Random 


N bits randomly 
Each bits independently 
N adjacent bits 


Greatly 
Yes 
No 


No 
Yes 
Greatly 



TABLE II. Impact of different types of breeding and mutation on exploration of parameter space. 



IV. ACCELERATION OF GENETIC ALGORITHM 



We have introduced above three fundamental concepts used in any GA (selection, breeding and mutation), which 
might be sufficient for a simple search. However, in our case (multi-dimensional parameter space with many local 
maxima) it might require a large number of iterations with the possibility that the end results might correspond to 
a local maximum. To reduce the required number of iterations and to increase the stability and efficiency of the 
algorithm, we introduce several accelerators which are used in our search. 
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A. Standard accelerators 



In this part, we describe the standard accelerators used in GA. 



1. Elitism 



Selection and breeding do not guarantee that the next generation will be better than previous one. If we completely 
replace the old generation with the new one, it is possible that we might lose the organisms with best quality. The 
overall tendency (trend of the evolution) is to increase the quality, but it can go down over some short period of time. 

The elitism (or cloning) is a simple way to maintain the quality across generations. If the best quality of the new 
generation is lower than the best quality of the current one, the best organism is propagated to the new generation. 
It is possible to clone one or several best organisms into the new generation. The elitism stabilizes the GA and 
guarantees the convergence of the algorithm. 



2. Simulated annealing 

The simulated annealing method has been already employed in LISA data analysis [29^ and proven to be very 
useful. In this method the smoothness of the quality surface is controlled by the introduced temperature parameter. 
If the temperature is high, the quality surface is very smooth and nearly all the organisms (good and bad) can be 
selected for breeding with a similar probability. If the temperature is low, the quality surface is highly peaked around 
the maxima and only the best organisms can be selected. Usually, a high temperature is selected at the beginning of 
the search to have a large area of exploration. The temperature is decreased as the solution is approached. 

Temperature is introduced in the selection process through the probability of selecting an organism. We set this 
probability according to pi = qi/ y~) j Qj where we have the quality of each organism redefined by the introduction the 
temperature parameter T as follows: 

[Qi Qbest) /oo\ 

Qi = CX P f » ( 38 ) 

where T is the temperature, Qi is the quality of i-th organism and Qbcst is the quality of the best organism. One can 
see that all qi are similar if temperature is high. 

We devise several kinds of annealing. A standard type is the cooling: the temperature evolves from the initial 
temperature Tj to the final temperature Tf as follows: 



(39) 

where n is a generation number and n c is the duration of the cooling (in number of generations). The values of T, 
and Tf are not known a priori. An alternative approach to control the temperature evolution is to relate it with the 
quality of the current generation. The temperature is then evolved according to 

T = (—) with p th = p if p < p th , (40) 
\PthJ 

where p = \ / 2Qi }est (which is the SNR of the best organism if we use log likelihood as a quality Q) and g and pth are 
two additional parameters. Usually we use g = 2 which corresponds to the thermostated annealing introduced in [29l |. 
In the beginning we keep the temperature equal to unity, and a high PMR is used to explore the parameter space 
and build up the SNR. On reaching p t h, heating is switched on to increase the exploration area by smoothing the 
likelihood surface and to force the colony to search for a higher maximum. Periods of high temperature are alternated 
with periods of low temperature (in a periodic manner), this way the regions around the local maximum and the 
global parameter space are explored in turn. 




3. Evolution of PMR 



As mentioned above, another way to control the volume of exploration is by varying the PMR (see Section MI E|) . 
Usually we start with a large value for the PMR (about 0.2), which is then gradually decreased to give more importance 
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to the breeding. In the end, the search becomes stationary and approaches the true solution, so the PMR needs to be 
quite low (usually we decrease it down to 0.01). The typical spread of the organisms in the beginning of the search 
is depicted in the left panel of the Figure |4] and we slowly evolve it towards the right panel by decreasing PMR. 

The three most frequently used types of PMR evolution in our analysis are (i) cooling, (ii) fixing and cooling and 
(iii) genetic genetic algorithm with PMR. 

In the first case of cooling, the PMR evolves from the initial value PMRi at generation n = to the final value 
PMR/ at generation n = n c according to 



[ PMR/ 



PMR=l PMR f{^) n ^ (41) 



otherwise. 



In the second case of fixing and cooling, at the beginning, the PMR is fixed as PMR = PMR^ for m generations, 
then it is cooled to PMR/ in the next n c generations as follows: 

PMR,, if n < m 

™R={ _ (42) 



PMR/ J 

In the last case of genetic genetic algorithm, the PMR is treated as an additional parameter of each organism. 
The PMR parameter evolves (we search for an optimal value) by the genetic operations in the specified range 
[PMR min ,PMR max ]. 

We use all the above types of PMR, for each gene we specify its own evolution path. Some parameters converge 
to the true solution faster than other, and some spin related parameters have multiple solutions. We use the PMR 
evolution scheme which reflects the convergence of the parameter and uniqueness of the solution. 

Note that we control the exploration area by using both simulated annealing and PMR. Each of these performs 
somewhat differently. Simulated annealing acts on the quality of the organism and affects the selection procedure for 
breeding, thus it uses the combination of the initial genes without adding new. On the other hand the PMR changes 
the structure of each gene and therefore brings in "new blood" into the generation (creates new combinations). The 
best result is usually achieved by combining together PMR with simulated annealing. 



B. Accelerators specific for MBH search 



In this part, we describe the non-standard acceleration processes introduced by us and which utilize the properties 
of the signal and/or of the antenna beam pattern. 



1. Brother 



The source sky position is encoded in our model of the signal in the phase (Doppler modulation) and in the 
amplitude through the antenna pattern function. For low frequencies the Doppler term is weak and majority of the 
information is stored in the directional sensitivity of the detector. However the antenna pattern function given in 
expressions (JTTJ) and Q18p is symmetric with respect to the transformation /3 — > —j3, A — > X + tt (mirrored/antipodal 
sky position). This implies a possible degeneracy in the parameter space, and, indeed we observe a high value of the 
quality at the antipodal sky position, making it very difficult to distinguish between those two. 

In order to overcome this problem, we introduced what we call the brother of the clone. With each clone we 
associate one organism (brother) created by copying the parameters values from the clone and then changing a few of 
these value by following particular rules. In our application of the GA for black hole binaries, the brother explores the 
parameter space around the mirrored (antipodal) sky position of each clone. In a particular search, the best organism 
usually jumps between these two sky positions until it settles on the best solution in terms of the quality. 



2. Local mutation 



What benefit one can have from using binary and Gray representations of the same parameter? The reason lies in 
representation of two adjacent integers in the binary representation. Two close decimal values of Ok which differ only 
by A x (i.e. corresponding integers differ by 1), may differ by several bits in their binary code. For example, in the 
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standard binary representation, the separation between the gene value 011111 and 100000 is equal to the resolution 
Ax (i.e. minimal distance), but, as one can see, it is necessary to flip all the bits for making this small change in 
the parameter value. This problem can be solved in two ways: (i) by alternating the Gray representation (where two 
adjacent integers differ by one bit) with binary, and (i) by introducing the "local mutation" . Local mutation is a 
small (of order few Ax) random change in the parameter value which can push it across the boundary. Note that the 
binary and Gray codes have different bit boundaries, so the alternation between them helps in the global exploration 
of the parameter space, whereas the local mutation helps the organisms to cross a particular bit boundary and works 
locally. 



3. Fixing the significant bits 



During the test runs of the GA, we noticed that some parameters are very well estimated already after few hundred 
generations. For example, the time of coalescence t c can be found with high precision in less than 200 generations. 
By restricting the search range of these well estimated parameters, the search efficiency can be improved. We achieve 
this by fixing (freezing) the most significant bits of such parameters which reduces the allowed dynamical range. This 
significantly speeds up the search. Note that we might still keep the PMR for this gene high in order to have an 
efficient exploration of the restricted parameter space. 

Let us give an example how it works in practice. The Figure [5] shows a typical example of the chirp mass, M c , 
exploration in our search. This parameter is encoded using 20 bits. First 200 generations have no restriction and 
a large PMR is used so the colony explores the whole of the prior range. However the higher concentration of the 
organisms around the best one (depicted by a green line) can be noticed which reflects its high quality and, therefore 
proximity to the true solution. After the 200 th generation we fix the bits at a position higher than a randomly 
chosen number between 14 and 16. It means that the bits &mc[16], 6mc[17], £>mc[18], £>mc [19] (and sometimes 6mc[14], 
&Mc[15]) of all the organisms are fixed to the value of the best organism (1,1,1 and here). It shrinks the search area 
to [2138483.938, 2384509. 746] M which corresponds to 

lower boundary = M c . min + AM C (0 x 2 19 + 1 x 2 18 + 1 x 2 17 + 1 x 2 16 ) 
upper boundary = M c , mm + AM C (0 x 2 19 + 1 x 2 18 + 1 x 2 17 + 1 x 2 16 + (2 16 - l)) 

After the 600 th generation we try to restrict the range further by fixing all the bits starting at the position 8 th or 9 th 
(again randomly chosen), which corresponds to narrowing down the range AM C x 2 9 = 1922.106 ~ 2OOOM . Note 
that, we can still release the bits (or change the random range) during the evolution to check the robustness of the 
found solution. 
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FIG. 5. Example of the chirp mass exploration by the colony of organisms. The green points correspond to the position of the 
best organism. The separations shows the structure of the binary representation. The numbers on the right are values at bit 
positions listed on top. 
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4- Specific breeding and mutation 

As mentioned above in Sec. (jlll Dl and IIII E|) . different types of breeding and mutation have different properties 
(main difference is in the expioration area around the best organism). The genes (i.e. parameters) do not have the 
same rate of evolution during the search. For example, the time of coalescence and the chirp mass converge to their 
true values quicker than other parameters. We customize the evolution of each gene by fixing the significant bits in a 
similar manner to the example discussed in the previous section. We also alter the type of breeding and mutation of 
each gene, forcing the exploration range to be large at the beginning of the search and changing to the types which 
are more suitable for more intensive local exploration close to the end. 

5. Change of environment 

While mapping the log-likelihood we have noticed that the binaries that coalesce within the observational time have 
more local maxima than the binaries coalescing outside the observational time (this also was mentioned in [46[ for 
non-spinning BHs). This can be explained by the accumulation of SNR. Due to the shape of the LISA's sensitivity 
and the evolution of the signal's amplitude, the largest part of SNR comes from the last month of inspiral. In the 
Figure E] we give accumulation of SNR (scaled by the total SNR) for one of the signals analyzed in MLDC3.2. We 
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FIG. 6. Example of accumulation of SNR (MBH-1) as a function of frequency. The points with attached numbers show the 
time until the coalescence in seconds. 

plot the SNR as a function of frequency, we also show (numbers attached to the circles) the time left to coalescence. 
As one can see 60% of SNR comes from the last day and a half of inspiral. The above implies that we need to fit only 
the last day of the signal in order to get a large SNR (in case where we see the coalescence) . This is obviously can be 
done in many ways and this results in multiple maxima of the likelihood. If the coalescence is not observed then we 
need to fit a large number of cycles to accumulate the appreciable SNR, this is harder to achieve unless we are close 
to the correct solution. In this sense it is easier to find the weaker signal with the time of coalescence outside the 
observational time. We have utilized this fact in introducing the A-statistic which enhances the low-frequency part 
of the signal. 

We have also implemented the accelerator which we call "change of environment" . We put the colony in different 
environments and expect the fitter organisms to survive in a variety of circumstances. In practice we terminate the 
template earlier in frequency and evolve the colony for some time with chopped templates. By changing the frequency 
range we change the likelihood surface, the secondary maxima change the size and position, but the global maximum 
remains at the same position (location of the true parameters, as shown in the Figured]). We use this property to 
alternate between different environments. It helps to move the search away from the local maxima where it has a 
tendency to get stuck, and guides the best organism to the true solution. It forces the search to seek a better choice 
of parameters and can also be used to check for the convergence of the algorithm to the global maximum. 

A typical scheme used in our search is as follows: we start off with a full template and use the Maximized Likelihood, 
Ffutt as the quality, then we alternate the evolution between full and chopped templates ( "change of environment" ) 
still using Q = T. We finish the evolution of colony using A-statistic. 
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We should mention that the frequency annealing introduced in [29( helps not only to speed up the search, but also 
assists in moving between local maxima. The structure of the likelihood surface changes as the duration of the signal 
increases. 



V. MULTIMODAL SEARCH 



In this Section we explain how we modify the GA to explore multi-modality of the likelihood surface. As discussed 
in the previous sections, the quality surface have many local maxima. Several techniques (simulated annealing, PMR 
evolution, change environment, etc.) introduced above, help in finding the global maximum, but they all assume a 
single solution and, therefore, cannot help if there are several maxima of almost equal heights/amplitudes. 

Five spin-unrelated parameters (time of coalescence, chirp mass, mass ratio and sky position) can be estimated 
using the GA implementation described in previous sections with very high accuracy. The magnitudes of spins can be 
also determined in some cases quite well. However other parameters corresponding to the initial orientation of spins 
and of the orbital angular momentum are quite problematic. A typical situation is presented in the Figure [7J We 
color-code the quality corresponding to each initial orientation of vectors, it varies within 12% of maximum while the 
points arc scattered over the whole range. One can see several solutions which are very close in quality to the true 
one (depicted by a circle). The search for a single maximum will miss other peaks. Instead we want to explore all of 
them and, based on the likelihood of each peak, we can make a claim about possible multiple solutions. 
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FIG. 7. Example of the distribution of the best organisms from 196 runs of GA applied to the search for source MBH-3 of 
MLDC 3.2 (the third signal). The left upper panel shows the initial direction of spin 1, the initial direction of spin 2 is in 
the right upper panel and we plot the initial direction of the orbital angular momentum in the bottom panel. The color scale 
corresponds to the value of A-statisitc. 



The reason for such a degeneracy lies in the nature of the waveform itself. First of all these parameters are highly 
correlated, second, they enter the expression for the GW phase at higher post-Newtonian orders, and affect the phase 
and amplitude rather weakly. The later can also explain that we can determine the spins better if we observe the end 
of inspiral, where the contribution from the high order terms is appreciable. 

Another, and most natural, reason for multi-modality of the likelihood is the presence of multiple signals in the 
data. In the analyzed data set there were between 4 and 6 signals, but exact number was not disclosed. The signals 
usually have different SNR, the search converges to the signal with the largest SNR and explores the modes of this 
signal, other signals appear at the initial stage of the search (up to the point at which accumulated SNR of different 
signals is comparable). The main hint that we are looking at the multiple signals is different values for t c and M c : 
parameters which are determined most accurately. The strongest signal can be removed from the data to recover 
weaker ones. It is desirable at the end to refine all the parameters by using a super-template formed by combining of 
several signals. 

We want to define the mode associated with each local maximum and explore the parameter space in its vicinity. 
The basic idea of our Multimodal Genetic Algorithm (MGA) is to put a cluster of organism in each mode, to do so 
we use several clones. Each clone corresponds to a mode and all modes should have comparably high quality. We also 
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increase the size of the colony so that we keep the number of organisms per clone constant. The clones participate 
in the breeding often and attract other organisms and consequently exploring the neighborhood of each mode. We 
describe the implementation of the evolution later. 

The crucial point of the MGA is the choice of the clones, two conditions are necessary for an organism to become a 
clone. First, it should have a quality higher than a certain level. This level can be fixed arbitrary or defined relative 
to the best organism (for example Qdone > Q level = 0.8 x Qbest)- The second condition is that there should not be 
another clone on the same mode. For that, we define boundaries around a mode (i.e. the rule to separate the different 

modes) using the variances a\ (o c ione*j of each parameter 6k at the clone position 9 c i one . These variances correspond 

to the diagonal terms of the inverse Fisher Information Matrix (FIM) defined as 

1- = /— 

We refer readers to [Ho[ for details on FIM. 

For each generation we are choose all organisms with Qi > Q level as candidates to be cloned. Among these we 
select only the ones which form the independent modes: 



\0k,i — Qk,clonej | > Fg k \J o\ (0 c [ onej ) , (44) 

where index i refers to the candidates to be cloned, j to the selected clone and Fg k is a factor to control how large 
should be the distance between the two modes along the parameter 8k ■ We choose Fg k individually for each parameter 
and it varies between 15 and 50. This way we define the volume of each mode. 

There are two ways to evolve a colony with multiple clones. The first one is mentioned above, where we increase 
the number of organisms proportional to the number of selected clones (modes) and evolve the system using the GA 
described in previous sections. If qualities of modes are comparable we expect to have a fraction of organisms in the 
close vicinity of each clone, while the remaining organisms explore the space in between the modes. Once we have 
started the evolution, we keep the number of clones fixed. If another independent mode is found and its quality is 
higher than the lowest quality among the clones, than the weakest clone (lowest quality) is moved to the new found 
location. Note that we always attach a brother to each clone. 

The MGA described above requires a large number of organisms (we need to use at least 10 organisms per clone). 
This requires a specific implementation if we want to use a computer cluster. We use this algorithm but with a small 
(less than 10) number of clones. 

The second approach, which we used the most, disallows continuous communication between the modes. We perform 
several independents runs (evolutions) with a single clone. Then we analyze the end results of all runs and identify 
independent modes among them. We use these modes as clones for the next set of independent runs (evolutions) . We 
iterate this procedure until no new modes are found. In this approach the modes exchange information discretely, 
after each single run. This multi-runs MGA is described in detail below in Section IVl Bl 

VI. PIPELINE 

In this section we describe the chain of algorithms used to arrive at the final result presented in the following 
section. 

A. Pre-analysis by a time frequency method 

The GW signals from the MBH binaries are usually very strong and do not need very sophisticated methods to 
detect them, especially if we observe the end part of the binary evolution. However, it is more complicated if we 
observe only early part of the inspiral. Before analyzing the data with GA we looked at the time-frequency map of 
the data constructed using the Morlet wavelet transform (see Figure [5]). From this map, we can clearly identify three 
strong signals with the time of coalescence within the observational time and one weak signal with time of coalescence 
about 3 months after the end of observations (signals are pointed by arrows). As we have mentioned earlier and will 
discuss in detail later, there is one more weak signal which coalesces even later. This signal is low frequency and too 
weak to be seen by eye in the data. 

In producing submitted MLDC results, we split the data in three parts, based on the time-frequency analysis 
discussed in the previous paragraph. The first part contains the strongest signal completely and low frequency parts 
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(few first months) of all other signals. The second part contains two other coalescing binaries and parts of remaining 
weak signals. Finally the third part contains only the signals coalescing outside the observational time. An iterative 
approach could also have been employed where the strongest signal is found and then removed from the data which 
is then analysed to detect other signals. This process is repeated until no more signals can be found. Estimating 
the residuals after subtracting the detected signals presents a particular problem in this incremental approach. A 
disadvantage of our chosen approach is the lose of some SNR, but we can be sure of avoiding the corruption of the 
weak signals by residuals of strong ones. However, it turns out that in order to find the fifth (the weakest) signal we 
had to remove the fourth signal (the right most one in Figure [5]) due to the strong interference with the secondary 
maxima. 
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FIG. 8. Time- frequency representation of channel TDI A of MLDC 3.2. We plot the norm of the Morlet wavelets transformation. 



B. Multiple steps MGA search 



We should not forget that GA is a stochastic search method. We can be sure about the convergence to the true 
solution if the likelihood surface is smooth and uni-modal. Unfortunately it is not the case, we have implemented many 
tricks to get through the forest of the local maxima to find the highest peaks. As mentioned earlier, our algorithm 
found several solutions with similar values of the likelihood. The evolution can still end up on one or another of these 
maxima, depending on the initial state and the seed of the sequence of random numbers. This is the reason behind 
implementing MGA. We have briefly introduced the multi-runs MGA in Section [Vj here we give a bit more detailed 
description. 

In this implementation of MGA, the modes exchange information discretely. We start with runs of a "single 

clone + brother + 20 organism" evolution. We use all accelerators introduced in Section IIVI We call each of these 
runs "standard" (as opposed to the global MGA run/search). In the first step we explore the parameter space trying 
to find as many maxima as possible. We evolve each colony for 2500 generations, then we collect the results of all 
these evolutions and identify the modes associated with the best organisms as described in the Section [V] In our 
search we followed 50 best modes. We attach a colony and start another standard run for each mode, in other words 
we start iV™° de = 50 independent evolutions for a single clone plus colony. In this step each mode is either refined 
or migrates to a new location outside its boundary, with a higher quality. In addition we restart Ni^f n standard 
evolutions searching for more modes. At the end of this step, the results from the N^f e + N^f n runs are collected 
and a new set of modes is identified. We iterate the process until the 50 best modes do not change anymore. 

We found that number of strong modes depends on the parameters of the signal and therefore keeping the number 
of modes to be explored fixed is unreasonable. In the post-MLDC exploration we used a variable number of modes: 
keeping all the modes with the quality within 2% of the best one. 

Both standard and mode exploration runs have similar evolution and differ mainly by an initial state of the evolution. 
We always use simulated annealing and the temperature alternates between hot and cold phases, the threshold pth 
(see part IIV A~2j) . which regulates the temperature, decreases with the number of generation for both phases. The 
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number of organisms in each run is kept fixed at 20. For the standard run, the evolution of the PMR and of the 
type of breeding and mutation are chosen such that the exploration volume is high at the beginning of the search and 
low towards the end (i.e. the volume near the vicinity of the best organism is search more intensively). As evolution 
progresses we gradually fix the range of parameters in the following order: t c , M c , 77, sky position, amplitude of spins, 
and finally, the initial direction of the orbital angular momentum and spins. 

As mentioned above there is a significant difference in likelihood maps for the signals coalescing within and outside 
the observational time. This is reflected in the search strategy for those two types: due to fewer number of secondary 
maxima for signals with coalescence outside observational time, we used only moderate simulated annealing and 
change of environment and fixed the ranges of M c , t c much later in the evolution. 

After some iterations the modes reach the stable state: we do not see any new modes and the existing modes are 
settled at stationary positions (maxima). At this point we stop the run, and all the modes found constitute our 
solution. One can use a Bayesian approach to assign a probability to each mode by calculating the Bayesian evidence. 

VII. RESULTS 

We have previously shown (5l| that GA works very well in the case of the non-spinning MBH even without using 
the multimodal search. In this section we discuss the results of the search for spinning MBH binaries. We present 
here the outcome of the analysis of MLDC3.2. By the deadline we did not implemented the MGA in full and therefore 
we have below two subsections: In the subsection IVII Al we give the results submitted by the deadline, and in the 
subsection IVII Bl we present the results of the full scale MGA analysis (obtained after the deadline). The main 
difference is in the number of recovered modes and switching to the full LISA response at the end of the search to 
reduce the bias due to mismatch between response function used in the signal generation and the one used in our 
analysis. 

A. MLDC results 

The signals present in the data can be split in two types: the binaries with the time of coalescence inside the 
observational time and others whose coalescence happened outside the observations. The difference between these 
two types is in the number of local maxima, SNR and consequently in the accuracy of the recovered parameters. 

1. Coalescence within the observational time 

We have found three signals of this kind in MLDC3.2. In the MGA we restricted the search to only 50 best modes 
selected at each step. Among 50 explored modes for each signal, we have identified a small number of distinctly 
strong and comparable modes for the submission. After 14th, 8th and 7th iterations respectively of the multi-runs 
MGA search, we obtained five modes for the strongest signal with the shortest t c (srcMCl which is MBH-1 in MLDC 
notation), four modes for the second one (srcMC2 or MBH-3 in MLDC) and six modes for the weakest signal (srcMC3 
or MBH-4 in MLDC). 

The results are presented in the first half (first three rows) of the Table ILTI1 which lists the relative/absolute errors, 
global overlaps and quality for modes submitted in MLDC 3.2 for each signal (without the direction of the spins 
and of the orbital angular momentum) . These errors should be compared to the corresponding predictions from FIM 
which are also given in the Table IIIII in the row labelled as "True" . For the chirp mass, the errors for all the modes 
are similar to the ones estimated from the FIM. For others parameters, the errors are generally few times higher than 
predicted by the FIM. At least part of this discrepancy comes from the bias caused by the signal approximation - we 
have used the long wavelength limit which is valid for the low frequency part and breaks down near the coalescence. 
The mode with the error for the sky position higher than 175 degrees corresponds to the antipodal location on the 
sky. Taking this as a genuine degeneracy, we see that the source location is found with the precision higher than 10 
degrees for srcMCl, 5 degrees for srcMC2 and one degree for srcMC3. 

We found a strong degeneracy in the initial directions of the orbital angular momentum and spins, so we decided 
to submit several well separated modes. Only for srcMC2, one of these modes corresponds to the true parameter set. 
For srcMCl and srcMC3, the true mode was missed, however we found it in the full scale MGA analysis conducted 
after the deadline (see subsection IVII B( ) . 
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1.0 
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125.3 
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True 
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725.6 
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1.0 


184.99 


srcMC4 


1 
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MBH-2 
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10.47 
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251.4 
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3 
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409.0 


153.0 
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197.08 
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srcMC5 


True 


315.1 
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6.40 
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321.4 


1.0 


38.75 


MBH-6 


1 


1042.3 
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2.11 
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191.6 


260.5 
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2 


293.7 
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89.6 
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TABLE III. Relative/absolute errors, global overlap, O, and quality for the modes submitted in MLDC 3.2. All parameters 
are defined in III Al ASky is the angular distance in the sky between the true and the estimated positions. The second column 
gives the mode number. The errors for true parameters are obtained using the FIM. For the tree first sources (srcMCl, srcMC2 
and srcMC3) which coalesce during the observational time the quality corresponds to yl-statistic and the two others (srcMC4 
and srcMC5) which coalesce after the end of observation the quality is the Maximized Likelihood T . 



The last two columns of the Tabic Mil show the value of ^4-statistic (quality column) for each mode and the multi- 
stream overlap defined as 

o( § ) = (Mgg) I Mj + (Mgg) I h E (8 t )) 

Af[h(9 t )] N[h(§ e )] 

where 0t corresponds to the true parameters, 9 e are our estimated parameters and Af[h(8)] is the norm of the template 
h(8) defined as 

M[h(9)} = \J(h A (6) | h A (8)) + (h E (6) | h E (9)). (46) 

The overlap O varies between -1 and 1 (from perfect anti-correlation to perfect correlation) and it tells us the loss 
in the SNR = Af[h{6 t )] Q due to the mismatch between the signal and the template. The SNRs of sources srcMCl, 
srcMC2 and srcMC3 are 1670.58, 847.61 and 160.51 respectively. 

All of these modes have an overlap with the true solution higher than 99%. The value of A-statistic as well that 
of Maximized Likelihood for the recovered modes is higher than the corresponding values for the true parameter set. 
This is a manifestation of the mismatch between the signal and the template and indicates the importance of using the 
full response towards the end of the search. We were aware of this but did not have time to implement it completely 
before the MLDC submission deadline. Nevertheless, given this bias in the search, our results are still quite accurate. 



1 Here we have defined the theoretical SNR as an average over the ensemble of the noise realizations. 
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2. Coalescence beyond the observational time 

During the search we found two signals of this kind. The results are presented in the second half (last two rows) 
of the Table HTT1 Those are low frequency signals, so our long wavelength approximation works very well resulting in 
very small or no bias in the parameter estimation due to the mismatch between the response functions as discussed 
in the previous section. 

First we identified the source with SNR 18.63 (which is srcMC4 or MBH-2 in MLDC notations). For this signal we 
found several modes after 8 steps of MGA search, out of which we selected four modes with highest quality for the 
submission. From the Table IIIII it can bee seen that the errors in the spin independent parameters are similar to the 
errors predicted by the FIM. Spin-orbital and spin-spin couplings enter the phase at 1.5 and 2 PN orders respectively, 
and since we do not observe the end of the inspiral, these terms contribute very little to the phase as well as to the 
amplitude modulation (see the orbital frequency dependent term in equations {J}-©). Therefore the spin related 
parameters are intrinsically poorly identified for these sources which is reflected in our results. 

The fifth and the last source is the weakest. In fact it was completely contaminated by the secondaries of the 
srcMC4. In order to identify this source we had to remove the fourth signal. We identified the srcMC4 with the best 
(highest quality) recovered mode, generated the signal and subtracted it from the time series. After that we repeated 
the search and already the first standard run found the mode with SNR > 7 which was a positive detection. Before 
the deadline we could perform only 3 steps of the MGA, however this turned out to be sufficient, as is indicated by the 
overlap column in the Table Hill We have clearly identified two modes with the opposite sky positions. The SNR for 
this signal was 12.82 and consequently the parameters have large uncertainties. The initial directions of the spins and 
the orbital angular momentum could not be identified at all. Other uncertainties are consistent with the FIM. This, 
fifth signal, was correctly identified only by us among all the participants of MLDC3.2 (at least with the precision 
which gave an overlap of 0.92). 



B. Post-MLDC results 



After the MLDC submission deadline, we finalized the implementation of the MGA (this time we have kept all the 
modes within some fraction of the maximum) and have performed the search to completion. We have also incorporated 
the full LISA response in our template using LISACode [HD, [53| to refine the final solutions. We discuss the details 
below. 

For the few first steps we kept all the modes with quality higher than 50% of the best one, then we increased the 
mode selection threshold to 90% (or higher, depending on the number of modes detected for a given signal). We 
also improved the mode separation criteria by adjusting Fg k based on the detailed study of the quality distribution. 
Finally, we have also added two final search steps using the templates with the full TDI response. Here we used a lite 
version of LISACode simulator [521 [53j . The lite version contains some fine-tuned trick which allowed us to compute 
the two-years long template in less than 15 seconds. The final steps with the full TDI response are required only 
for the signals which coalesce within the observational time. Only those signals propagate to high frequency where 
the long wavelength approximation is not accurate any more, and the SNR is high enough this to matter. Including 
the full response also helped for srcMCl to promote the mode (increase its quality) closest to the true solution and 
slightly suppress the others. 

For the last, full response search, we selected the modes within 98% of the best one. This results in the selection 
of 26 modes for source srcMCl after 13 steps of MGA, and 175 and 17 modes for the sources srcMC2 and srcMC3 
respectively after 9 steps of MGA . Figure [9] shows the distribution of the Maximized Likelihood with the full LISA 
response relative to the true one, J^Fuii^l 'J~ Full, true, for the modes of the source srcMC2. For this source, 36 modes 
have SNR0 within one standard deviation of the SNR true : ASNR^ = |SNR,; — SNR true | < 1, and 4 of them have 
FfuII,i higher than for the true waveform. Deviations of order unity in SNR can be easily produced by noise, note 
that besides the stationary Gaussian instrumental noise we also had cyclo-stationary Galactic confusion noise. There 
are similar results for other sources. For srcMCl we identified 21 modes with ASNRj < 1 and 6 of them have 
FfuII,i > J^Fuii.True, for srcMC3, 21 modes have ASNRi < 1 and 6 of them with J-Fuii.i > J'Fuii.True- We confirmed 



Approximating SNR ~ \/1T we have the following relation between T ratio and ASNR : 



J^True V SNRtr 
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these results also with the signals generated using syntheticLISA [54(, the simulator used to produce the data set, to 
avoid possible error coming from the use of two different simulators. 
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FIG. 9. Ratio between Maximized likelihood of modes and true Maximized Likelihood for source srcMC2. The blue dotted 
line corresponds to a ratio equal to one. The green dashed lines correspond to ASNRi < 1 

The difference between identified modes is within the fluctuation that can be caused by the noise, therefore there 
can not be a unique solution. However, all the modes are single valued in the non-spinning parameters and they split 
for the initial directions of the spins and the orbital angular momentum (sometimes also antipodal sky location). 

Our post-MLDC results are presented in Table IIVI in which we list the parameter estimates for three modes found 
for each source. These modes are described as follows: 

1. 'B'-mode ('B'-est) is the the mode with the highest Maximized Likelihood value using the template with the 
full response. 

2. 'C'-mode ('C'-losest) is the mode closest to the injected signal in all parameters. 

3. 'A'-mode ('A'-strophysically relevant) is the mode with the smallest error in the most relevant parameters from 
the astrophysical point of view (sky position, distance, masses, spin amplitude and time of coalescence). 

We estimate the "closeness" to the true parameters as 



d — max). 




where a k is an error in the estimation of parameter 9k and a 'p IM is a corresponding prediction from the FIM. 

Besides better identification of the modes we have also improved the parameter estimation which is reflected in 
increase of the overlap to > 0.999 compared to 0.99 for the MLDC results. 



VIII. SUMMARY 



In this paper we have described the application of the Genetic Algorithm to the the problem of detecting gravi- 
tational wave signals from inspiralling spinning MBH binaries and estimating their parameters. We described how 
GA can be translated to the problem of GW data analysis, and introduced some custom-designed accelerators of 
the evolution which allow us to efficiently explore the 13-dimentional parameter space. In addition to the standard 
^-statistic which is popular in the GW data analysis, we introduced a new detection statistic called ^.-statistic, which 
enhances the low frequency part of the signal. Use of A-statistic allows us to partially compensate for the mismatch 
between the template and the true signal and to change the structure of the quality surface eliminating some of the 
secondary maxima. 
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0.82 
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1.0 
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srcMC2 
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13.6 


4.8 
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1.33 


2.8 


28.3 
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4.1 


0.999939 
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154.6 
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24.1 


3.5 
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138.7 


22.90 
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40.5 
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55.9 
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181.3 
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7.3 


0.999311 
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MBH-4 




17.7 


8.5 


234.0 
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0.998723 
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TABLE IV. Post-MLDC results. We present relative/absolute errors, global overlap, O, and Maximized Likelihood using full 
response J'Full f°r selected modes for sources srcMCl, srcMC2 and srcMC3. All parameters are denned in III Al ASky, ASi, 
AS2 and AL are the angular (geodesic on a sphere) distance between the true and the estimated direction for, respectively the 
sky position, the spin of MBHi, the spin of MBH2 and the orbital angular momentum. The selected modes are given in several 
rows for each source and marked as: 'A' corresponding to the mode closest to the true one in the most relevant astrophysical 
parameters (sky position, distance, masses, spin amplitude and time at coalescence); 'C corresponding to the mode closest to 
the true one in all parameters; 'B' is the mode with the best TfuU- The mode without any mark is the second closest. 



We have found that the likelihood surface is highly multimodal with several modes having very high amplitudes. 
In order to incorporate this in our search we have extended the standard GA to the Multimodal Genetic Algorithm. 
We cluster strong local likelihood maxima in the parameter space within the volume denned by the slightly enlarged 
error boxes predicted by inverse of the Fisher information matrix. To each cluster or "mode" identified in such a 
manner, we attach a colony of the organisms. The colonies explore their local regions intensively and exchange the 
information after every 2500 generations. 

We apply this method for the analysis of MLDC3.2 data set. In the blind search, we have successfully found all 
5 signals and the recovered solutions have overlap higher than 99.2% for the strong (high SNR) signals and higher 
than 93% for the weak signals. The results submitted by the deadline did not fully reflect the capability of our search 
method, as the implementation was not complete. We have completed the search after the deadline by allowing MGA 
to reach the stable solution. We have also used the full TDI response during the last two steps of our post-MLDC 
analysis. This has allowed us to recover all modes and reduce the bias in the parameter estimation due to use of 
the long wavelength approximation in our search template. We have achieved a remarkable accuracy in estimating 
non-spinning parameters, as well as reasonably accurate estimation of the spin magnitudes if binary coalesces within 
the observational window. Our method is at least comparable, if not better, to other very successful algorithms such 
as MCMC with parallel tempering and MultiNest [3(| . The success of the MGA in case of the inspiralling spinning 
MBH binaries gives us the confidence that it should also prove to be highly efficient in the search for Extreme Mass 
Ratio Inspirals (EMRIs). 
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